The contractors have concentrated their efforts on the design of approximation techniques
in nonlinear filtering.


The problem is roughly as follows. $\{X_t\}$ being an unobserved diffusion process,
suppose that we observe the process $\{Y_t\}$ given by

$$Y_t = \int_0^t h(X_s)\,ds + \varepsilon W_t$$

where $h$ is one to one, $\{W_t\}$ is the "observation noise" and $\varepsilon$ is a small parameter. After the
initial work of Bobrovsky, Katzur-Schuss, J. Picard has rigorously proved in [9] that one can
design approximate filters whose difference with the optimal filter is of arbitrary order in $\varepsilon$.
Recently J. Picard [10] has improved his result in the sense that he no longer assumes
that the initial law of $X_0$ has a density. The new mathematical tool which made this improvement
possible is the stochastic calculus of variations, which is a branch of the so-called "Malliavin
calculus". On the other hand, A. Bensoussan [2] has given a purely analytic proof of the first
version of Picard's results, thus avoiding several delicate technical tools from the theory of
stochastic processes.
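As a rough illustration of this small-noise setting, the sketch below simulates the observation model with an Euler scheme. The signal dynamics (an Ornstein-Uhlenbeck diffusion), the one-to-one function h(x) = x + x^3, and every name in the code are assumptions chosen for the example; none of them comes from the report.

```python
import numpy as np

def simulate_small_noise(eps, T=1.0, n=1000, seed=0):
    """Euler scheme for an assumed signal dX = -X dt + dV and
    observation Y_t = int_0^t h(X_s) ds + eps * W_t."""
    rng = np.random.default_rng(seed)
    dt = T / n
    h = lambda x: x + x**3          # assumed one-to-one observation function
    x = np.empty(n + 1)
    y = np.empty(n + 1)
    x[0], y[0] = rng.normal(), 0.0
    for i in range(n):
        dv, dw = rng.normal(scale=np.sqrt(dt), size=2)
        x[i + 1] = x[i] - x[i] * dt + dv              # hypothetical signal dynamics
        y[i + 1] = y[i] + h(x[i]) * dt + eps * dw     # small observation noise
    return x, y, h

x, y, h = simulate_small_noise(eps=0.01)
```

When eps is small, the increments of Y divided by the time step are close to h(X), which is why inverting h is a natural starting point for approximate filters in this regime.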


Two new projects have been initiated on this subject and will be reported on in more
detail in the next reports.

• E. Pardoux studies, in collaboration with W. Fleming (Brown University, USA), the case
where $h$ is only locally one to one.

• Paula Milheiro (student of E. Pardoux) is making some numerical tests on the Picard
filters.


Consider a nonlinear filtering problem

$$dX_t = b(X_t)\,dt + \sigma(X_t)\,dW_t,$$

$$dY_t = h(X_t)\,dt + dV_t,$$

where $\{W_t\}$ and $\{V_t\}$ are standard Wiener processes; $\{X_t\}$ and $\{Y_t\}$ are supposed for
simplicity to be one dimensional, $\{X_t\}$ being unobserved and $\{Y_t\}$ observed.

We assume that we can partition R into a finite union of disjoint intervals $(I_1, \ldots, I_n)$ in
such a way that on each of the $I_j$'s $b$ and $h$ are linear, and $\sigma$ is constant. We can naturally
associate to the above nonlinear filtering problem n linear filtering problems. Suppose we start
the Kalman filter corresponding to the i-th linear filtering problem with the non-Gaussian initial
law which is the restriction to $I_i$ of the law of $X_0$. During a "small" interval of time h, most of
the "mass" stays in $I_i$, so that we make a small error by running the linear filter. The n-1 other
Kalman filters work similarly in parallel. At time h, the outputs of the n linear filters are
summed up, and the sum is split according to the partition $\{I_1, \ldots, I_n\}$, which gives the
initial laws for the n Kalman filters which run in parallel on the interval [h, 2h], etc.

C. Savona (student of E. Pardoux) has proved in her thesis [11] (see also [8]) that the
output of this procedure converges to the optimal filter as h → 0. More recently, she has
tested this procedure numerically. The first results are disappointing on some of the examples, in
the sense that it seems necessary to choose h very small for the result to be reasonably good.
This point will be checked again in the near future.
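The "sum, then split" bookkeeping of this procedure can be sketched on a grid as follows. This is only an illustration under assumed names (`split_by_partition`, the grid, the single break point at 0); the propagation of each piece by its linear filter over one time step is not implemented here.

```python
import numpy as np

def split_by_partition(density, grid, breakpoints):
    """Restrict a gridded density to each interval of a partition of R.

    breakpoints are the (assumed) interior end points of the intervals."""
    edges = np.concatenate(([-np.inf], breakpoints, [np.inf]))
    pieces = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (grid >= lo) & (grid < hi)          # half-open interval [lo, hi)
        pieces.append(np.where(mask, density, 0.0))
    return pieces

grid = np.linspace(-6.0, 6.0, 121)
density = np.exp(-0.5 * grid**2) / np.sqrt(2.0 * np.pi)

# Split: one initial law per linear filter.
pieces = split_by_partition(density, grid, breakpoints=np.array([0.0]))

# Sum: recombining the (here unpropagated) pieces gives back the density.
recombined = sum(pieces)
```

In the actual procedure each piece would first be propagated over one time step by the filter of its interval, and only then summed and split again.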


3 - Numerical solution of Zakai's equation.

F. Le Gland [6] has studied in great detail the problem of the time-discretization of
Zakai's equation. He suggests in particular a new scheme, whose error is of the order of $(\Delta t)^{3/2}$
if $\Delta t$ is the time-discretization step. An original probabilistic interpretation of the latter
scheme is provided.


4  -  Parameter  estimation  for  partially  observed  stochastic  processes. 

Fabien Campillo and Francois Le Gland [5] have compared the EM algorithm (proposed in
the context of partially observed stochastic processes by Dembo and Zeitouni) with the standard
maximum likelihood approach, which consists in maximizing the integral over the whole space
of the solution of Zakai's equation. The EM algorithm seems at first sight to be more efficient, but requires a
great deal of memory, since it uses a smoothing algorithm (vs. filtering). Also, the number of
iterations required has to be checked in practice. A numerical comparison will be done in the
near future.


Nonlinear filtering, which is a stochastic theory, has a deterministic counterpart, which
is the theory of "dynamic" observers. The object of this theory is to give a way of reconstructing
the solution of a given differential equation with unknown initial condition, from partial
observations. There are obvious connections between the theory of filtering and the theory of
observers. One of the issues of the latter is the question of observability, which is also an
important practical issue in filtering.


A. Bensoussan, J. Baras and M. James [4] have shown that a dynamic observer can be
viewed as the limit of stochastic filters, when the intensity of the noises tends to zero. A.
Bensoussan and J. Baras [3] have also studied observers for systems governed by PDEs.


II)  -  TRANSFER  FROM  FRANCE  TO  THE  U.S. 


A. Bensoussan has given a series of "distinguished lectures" at the Systems Research
Center of the University of Maryland in November of 1986, on nonlinear filtering and stochastic
control with partial observation.

E. Pardoux has given in March 1987 a series of lectures in the same framework, on the
applications of the Malliavin Calculus, in particular to nonlinear filtering. The Malliavin
Calculus is a new branch of stochastic analysis, which has been developed essentially in France,
the U.S. and Japan, by theoretical probabilists. This new tool has proved to have important
applications in filtering, and its popularization among applied probabilists is now an important
task.
Introduction

We propose an approximate solution method for the class of "piecewise linear" filtering
problems: the signal $\{X_t\}$ is the solution of a stochastic differential equation
whose drift and diffusion coefficients are respectively piecewise linear and
piecewise constant and non-degenerate, and one observes a process $\{Y_t\}$ of the
form

$$Y_t = \int_0^t h(X_s)\,ds + B_t$$

with $h$ continuous and piecewise linear and $\{B_t\}_{t \ge 0}$ a Brownian motion independent of the
signal. One seeks to characterize the conditional law of the signal $X_T$ given the σ-algebra of the
observations up to time T, for every T > 0 (the "filter").

For a linear filtering problem with Gaussian initial condition, the conditional law
of the signal given the observations is Gaussian, and its mean and variance
solve respectively a stochastic differential equation and an
ordinary differential equation of Riccati type, constructed from the observation (Kalman-Bucy
filter). In contrast, for an arbitrary nonlinear filtering problem one does not in general know how
to exhibit a finite set of sufficient statistics, solving
a finite-dimensional recursive system constructed from the observation, from which
the filter can be computed (when this is possible, the filtering problem is said to be finite dimensional).
Apart from particular cases, the direct solution of a nonlinear filtering problem therefore leads
to an infinite-dimensional algorithm. Benes-Karatzas, for example, study in [2] the
piecewise linear filtering problem in the case of a signal of dimension 1 whose
drift and diffusion coefficients are moreover respectively continuous and constant.
Using classical techniques for constructing the fundamental solution
of a partial differential equation of parabolic type, they obtain a representation of the
conditional density in terms of a finite number of sufficient statistics; part of
these statistics solves a finite-dimensional recursive system, but the other part solves
a system of integral equations (these two parts correspond respectively
to the intervals of linearity and to the treatment of the corner points of the coefficients).
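For reference, the Kalman-Bucy filter mentioned above can be sketched in the scalar case with an Euler discretization. The model dX = aX dt + σ dW, dY = cX dt + dB (unit observation noise), the coefficient names and the function name are assumptions made for this illustration, not notation from the paper.

```python
import numpy as np

def kalman_bucy(dy, dt, a, sigma, c, m0, p0):
    """Euler discretization of the scalar Kalman-Bucy filter for the assumed model
    dX = a X dt + sigma dW,  dY = c X dt + dB."""
    m, p = m0, p0
    means, variances = [m], [p]
    for inc in dy:
        gain = p * c
        # Conditional mean: stochastic differential equation driven by the observation.
        m = m + a * m * dt + gain * (inc - c * m * dt)
        # Conditional variance: deterministic Riccati equation.
        p = p + (2.0 * a * p + sigma**2 - (c * p) ** 2) * dt
        means.append(m)
        variances.append(p)
    return np.array(means), np.array(variances)

# Brownian signal (a = 0, sigma = 1) observed through c = 1, zero initial variance.
means, variances = kalman_bucy(np.zeros(1000), 1e-3, 0.0, 1.0, 1.0, 0.0, 0.0)
```

Only the mean depends on the observed increments; the variance can be computed off-line, which is what makes the linear Gaussian problem finite dimensional.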




In [7], a sequence of suboptimal filters was exhibited for the piecewise linear
filtering problem, obtained by discretizing time and exploiting the piecewise linear character
of the coefficients, and the convergence of these filters to the optimal filter
was established. Here we exploit this convergence result and show how the piecewise linear
filtering problem can be solved approximately by computing a
bank of linear filters with non-Gaussian initial condition; these are computed using
the algorithm proposed by Makowski in [6].

Finally, let us mention that Di Masi-Runggaldier study in [3], [4] the piecewise linear
filtering problem in discrete time. In [4] they propose a finite-dimensional filter
which "approximates" it in the sense that the conditional moments for the piecewise linear
problem and for this finite-dimensional filter converge to the same limit as the
variances of the initial law and of the signal noise tend to 0. In [3], they treat the
particular case where the coefficients of the signal are piecewise constant; the conditional law
is then a linear combination of K Gaussians: the mean and the variance of these Gaussians
and their number K are functions of the coefficients of the signal, and the coefficients of the linear
combination are computed recursively.

In §1 we give the formulation of the piecewise linear filtering problem and of the
approximate filtering problems, and we recall the convergence result established in [7];
then in §2 we study an approximate solution algorithm for this problem. The curves
representing the conditional density for a numerical example are given in the appendix.

Notations. We denote by $C_b(\mathbb{R}^d)$ the set of bounded continuous functions on $\mathbb{R}^d$. We fix
a terminal time T and denote by $C^d$ the set of continuous functions defined on $[0,T]$
with values in $\mathbb{R}^d$, and by $C_0^d$ the set of elements of $C^d$ which moreover vanish at 0. If X
is a random process defined on $C^d$ equipped with its Borel σ-algebra, we agree to denote by
$(\mathcal{F}_t^X)_{t \ge 0}$ its natural filtration.

1. Formulation of the filtering problem and of its approximations

Let $\{P_k,\ 1 \le k \le K\}$ be a finite partition of $\mathbb{R}^d$ where the $P_k$, $k = 1, \ldots, K$, are
polyhedra. Let $b$ and $h$ be two maps from $\mathbb{R}^d$ into $\mathbb{R}^d$ and $\mathbb{R}^p$ respectively, affine on
each of the polyhedra of the partition $\{P_k,\ 1 \le k \le K\}$, $h$ being moreover assumed continuous.
Let $\sigma_1, \ldots, \sigma_K$ be K non-degenerate $d \times d$ matrices and $\sigma$ the function taking the value $\sigma_k$ on
$P_k$. Finally, we denote by $b_k$ (resp. $h_k$) the affine function from $\mathbb{R}^d$ into $\mathbb{R}^d$ (resp. into $\mathbb{R}^p$)
which coincides with $b$ (resp. $h$) on $P_k$.

Let $\Omega = C^d$ with generic element $\omega$, $\{X_t,\ t \in [0,T]\}$ the canonical process on $\Omega$, and
$\pi_0$ a probability law on $\mathbb{R}^d$, absolutely continuous with respect to Lebesgue measure,
with density $p_0$ admitting exponential moments of all orders. By Krylov [5], the
martingale problem associated with the stochastic differential equation

$$dX_t = b(X_t)\,dt + \sigma(X_t)\,dW_t, \qquad t \in [0,T] \qquad (1)$$

admits a solution; since the function $\sigma$ is moreover non-degenerate and piecewise constant
on a finite family of polyhedra of $\mathbb{R}^d$, uniqueness in law of the solutions of (1) holds by
Bass-Pardoux [1]. We can therefore consider $P$, the unique solution on $(\Omega, \mathcal{F}_T^X)$ of the
martingale problem associated with (1) such that, under $P$, $\pi_0$ is the law of the random vector $X_0$. Expectation
with respect to $P$ is denoted $E$, and for $s \in [0,T]$, $x \in \mathbb{R}^d$, $E_{s,x}$ denotes expectation with
respect to the law of the process solving (1) with the condition $X_s = x$.

The piecewise linear filtering problem $\mathcal{P}$ is the filtering problem with a
continuous signal $\{X_t,\ t \in [0,T]\}$ with values in $\mathbb{R}^d$, of law $P$, and an observed process

$$Y_t = \int_0^t h(X_s)\,ds + B_t, \qquad t \in [0,T],$$

with $\{B_t,\ t \in [0,T]\}$ a Brownian motion independent of the signal. Let us recall how
this problem is constructed by the reference probability method. Let $(X, Y)$ be the
canonical process on $C^d \times C_0^p$ equipped with the probability law $\mathring{P} = P \otimes W$, $W$ being the
Wiener measure on $C_0^p$ ($\mathring{P}$ is the reference probability). On $C^d \times C_0^p$ we define the
process

$$Z_t(\omega, y) = \exp\Big\{ I_t(\omega, y) - \frac{1}{2} \int_0^t |h(X_s(\omega))|^2\,ds \Big\}, \qquad t \in [0,T],$$

where, for every $\omega \in \Omega$, $I(\omega, \cdot)$ is $W$-indistinguishable from the stochastic integral $\int_0^{\cdot} h(X_s(\omega))\,dY_s$.
The assumptions made on $b$, $\sigma$ and $h$ ensure that $\{Z_t\}_t$ is an $(\mathcal{F}_t^{X,Y}, \mathring{P})$ martingale. Now let
$\mathbf{P}$ be the probability law on $C^d \times C_0^p$ defined by

$$\frac{d\mathbf{P}}{d\mathring{P}}\Big|_{\mathcal{F}_t^{X,Y}} = Z_t.$$

Solving problem $\mathcal{P}$ consists in computing, for every $t \in [0,T]$, $\Pi_t$, the regular conditional
probability law of $X_t$ given $\mathcal{F}_t^Y$ under $\mathbf{P}$, called the "filter" or "normalized filter"
at time t. For every element $y$ of $C_0^p$, we introduce the measure-valued function
$\{p_t^y,\ t \in [0,T]\}$ defined by

$$\forall t \in [0,T],\ \forall \varphi \in C_b(\mathbb{R}^d): \qquad (p_t^y, \varphi) = E[\varphi(X_t)\,Z_t(y)].$$

By the Kallianpur-Striebel formula, for every $\varphi$ in $C_b(\mathbb{R}^d)$, the following equalities hold almost surely:

$$(\Pi_t, \varphi) = \frac{\mathring{E}[\varphi(X_t)\,Z_t \mid \mathcal{F}_t^Y]}{\mathring{E}[Z_t \mid \mathcal{F}_t^Y]} = \frac{(p_t^Y, \varphi)}{(p_t^Y, 1)}.$$

The process $\{p_t^Y,\ t \in [0,T]\}$ is called the unnormalized filter (solution of the
Zakai equation), and the Kallianpur-Striebel formula expresses that knowledge of the unnormalized filter
characterizes the filter $\Pi_t$. We also introduce the $\mathcal{F}^Y$-adapted measure-valued
process $q(t, dz, s, x)$, $0 \le s \le t \le T$, $x \in \mathbb{R}^d$ which, for a value $y$ of the observation, is
defined by

$$\forall \varphi \in C_b(\mathbb{R}^d): \qquad \int \varphi(z)\,q(t, dz, s, x) = E_{s,x}[\varphi(X_t)\,Z_t(y)].$$
The process $q(t, dz, 0, x)$ is the fundamental solution of the Zakai equation for $\mathcal{P}$, and we
have

$$\forall \varphi \in C_b(\mathbb{R}^d): \qquad (p_t^y, \varphi) = \int \Big\{ \int \varphi(z)\,q(t, dz, 0, x) \Big\}\,p_0(x)\,dx = \int \varphi(z) \Big\{ \int q(t, dz, 0, x)\,p_0(x)\,dx \Big\}.$$
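The Kallianpur-Striebel ratio can be illustrated by a crude Monte Carlo estimate: signal paths are simulated independently of the observation (as under the reference probability) and weighted by the discretized likelihood Z. This is a minimal sketch for a scalar observation; the function name, the left-point discretization of the integrals and the test problem (Brownian signal observed through |x|) are assumptions of the example, not the method studied in the paper.

```python
import numpy as np

def kallianpur_striebel(phi, h, x_paths, dy, dt):
    """Estimate E[phi(X_T) | observations] as a ratio of Z-weighted averages.

    x_paths: (N, n+1) signal paths simulated independently of the observation.
    dy:      (n,) observed increments of Y."""
    hx = h(x_paths[:, :-1])                                  # h at left end points
    log_z = hx @ dy - 0.5 * np.sum(hx**2, axis=1) * dt       # log of discretized Z_T
    weights = np.exp(log_z - log_z.max())                    # stabilized exponentials
    return np.sum(weights * phi(x_paths[:, -1])) / np.sum(weights)

rng = np.random.default_rng(1)
n, dt = 100, 0.01
steps = np.hstack([rng.normal(size=(501, 1)),
                   rng.normal(scale=np.sqrt(dt), size=(501, n))])
paths = np.cumsum(steps, axis=1)
truth, particles = paths[0], paths[1:]
dy = np.abs(truth[:-1]) * dt + rng.normal(scale=np.sqrt(dt), size=n)
estimate = kallianpur_striebel(np.abs, np.abs, particles, dy, dt)
```

The common factor removed by `log_z.max()` cancels in the ratio, which mirrors the fact that the normalized filter does not depend on the normalization of the unnormalized one.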

Let us now construct the family of approximate filtering problems. For
this, we take a sequence $\{\pi^n\}_n$ of subdivisions of $[0,T]$, $\pi^n = \{0 = t_0^n < t_1^n < \ldots < t_n^n = T\}$; we
set, for $n \in \mathbb{N}$, $s \in [0,T]$,

$$|\pi^n| = \max\{|t_{j+1}^n - t_j^n|,\ j = 0, 1, \ldots, n-1\},$$

$$I^n(s) = \{i = 0, 1, \ldots, n :\ t_i^n \le s\}, \qquad n(s) = \max I^n(s), \qquad s_n = t_{n(s)}^n,$$

and we assume that $|\pi^n| \to 0$ as $n \to \infty$. For every n, we denote by $b_n$ (resp. $\sigma_n$, $h_n$)
the functional on $[0,T] \times \Omega$ defined by

$$\forall (s, \omega) \in [0,T] \times \Omega: \qquad b_n(s, \omega) = \sum_{k=1}^K 1_{P_k}(\omega(s_n))\,b_k(\omega(s)),$$

$$h_n(s, \omega) = \sum_{k=1}^K 1_{P_k}(\omega(s_n))\,h_k(\omega(s)),$$

$$\sigma_n(s, \omega) = \sum_{k=1}^K 1_{P_k}(\omega(s_n))\,\sigma_k.$$

Existence and pathwise uniqueness hold for the solutions of the martingale problem associated with
the stochastic differential equation

$$dX_t = b_n(t, X)\,dt + \sigma_n(t, X)\,dW_t \qquad (2n)$$

and we denote by $P^n$ the unique probability law on $(\Omega, \mathcal{F}_T^X)$ solving this problem with
initial law $\pi_0$. We denote by $E^n$ expectation with respect to $P^n$ and by $E^n_{s,x}$, $s \in [0,T]$, $x \in \mathbb{R}^d$,
expectation with respect to the law of the process solving (2n) with the condition $X_s = x$.
We take as filtering problem $\mathcal{P}^n$ the problem with a continuous signal $\{X_t,\ t \in [0,T]\}$
with values in $\mathbb{R}^d$, of law $P^n$, and an observed process of the form

$$Y_t = \int_0^t h_n(s, X)\,ds + B_t^n, \qquad t \in [0,T],$$

with $\{B_t^n,\ t \in [0,T]\}$ a Brownian motion independent of the signal. Again by the
Kallianpur-Striebel formula, the normalized filter $\Pi_T^n$ at time T for problem $\mathcal{P}^n$
is characterized by the unnormalized filter $\{p_T^{n,y},\ y \in C_0^p\}$ given by

$$\forall \varphi \in C_b(\mathbb{R}^d): \qquad (p_T^{n,y}, \varphi) = E^n[\varphi(X_T)\,Z_T^n(y)]$$

with

$$Z_T^n(\omega, y) = \exp\Big\{ I_T^n(\omega, y) - \frac{1}{2} \int_0^T |h_n(s, \omega)|^2\,ds \Big\}$$

where, for every $\omega \in \Omega$, $I^n(\omega, \cdot)$ is $W$-indistinguishable from the stochastic integral $\int_0^{\cdot} h_n(s, X)\,dY_s$.
Finally, as above, we introduce the measure-valued processes $q_n(t, dz, s, x)$, $0 \le s \le
t \le T$, $x \in \mathbb{R}^d$ which, for a value $y$ of the observation, are defined by

$$\forall \varphi \in C_b(\mathbb{R}^d): \qquad \int \varphi(z)\,q_n(t, dz, s, x) = E^n_{s,x}[\varphi(X_t)\,Z_t^n(y)].$$

For fixed n, problem $\mathcal{P}^n$ is, just like the initial problem $\mathcal{P}$, a nonlinear
filtering problem; it is not finite dimensional either, but it has an interesting
property: for every $x$ in $\mathbb{R}^d$ and every $j = 0, 1, \ldots, n-1$, $\mathcal{P}^n$ is conditionally
linear given $X_{t_j^n} = x$ on the interval $[t_j^n, t_{j+1}^n[$. Moreover, it was established in [7] that
the sequence $\{p_T^{n,y}\}_n$ converges weakly to $p_T^y$, uniformly on compact subsets
of $C_0^p$. In what follows, we use this property of the $\mathcal{P}^n$ and the convergence result
to compute an approximation of the solution of problem $\mathcal{P}$. From now on the superscript $y$ is
omitted from the notation of the filters.

2. Presentation of an approximate solution algorithm for $\mathcal{P}$

To simplify the notation, we assume that $\{\pi^n\}$ is the sequence of regular subdivisions
of $[0,T]$ with step $\delta_n = T/n$. For $k = 1, \ldots, K$, we introduce the stochastic differential
equation

$$dX_t = b_k(X_t)\,dt + \sigma_k\,dW_t, \qquad t \in [0,T]. \qquad (3k)$$

For every $x$ in $\mathbb{R}^d$ and $s$ in $[0,T]$, we denote by $E^k_{s,x}$ expectation with respect to the law
of the process solving (3k) with the condition $X_s = x$, and we define, for a fixed value
$y$ of the observation, the measure-valued processes $q^k(t, dz, s, x)$, $0 \le s \le t \le T$, by

$$\forall \varphi \in C_b(\mathbb{R}^d): \qquad \int \varphi(z)\,q^k(t, dz, s, x) = E^k_{s,x}[\varphi(X_t)\,Z_t^k(\cdot, y)]$$

with

$$Z_t^k(\omega, y) = \exp\Big\{ I_t^k(\omega, y) - \frac{1}{2} \int_0^t |h_k(X_s(\omega))|^2\,ds \Big\}, \qquad (\omega, y) \in C^d \times C_0^p,$$

where, for every $\omega \in \Omega$, $I^k(\omega, \cdot)$ is $W$-indistinguishable from the stochastic integral $\int_0^{\cdot} h_k(X_s)\,dY_s$.
We fix $n \in \mathbb{N}$ and write $\delta$ for $\delta_n$; then, for $j = 0, 1, \ldots, n-1$,

$$p^n_{(j+1)\delta}(dz) = \int q_n((j+1)\delta, dz, j\delta, x)\,p^n_{j\delta}(dx) = \sum_{k=1}^K \int_{P_k} q^k((j+1)\delta, dz, j\delta, x)\,p^n_{j\delta}(dx)$$
which allows us to compute recursively an approximation $Q(j\delta, dz)$ of $p_{j\delta}(dz)$,
proceeding as follows. We set $Q(0, dz) = p_0(z)\,dz$; suppose that
the approximation $Q(j\delta, dz)$ of $p_{j\delta}(dz)$ is known. Then $p_{(j+1)\delta}(dz)$ is approximated by

$$Q((j+1)\delta, dz) = \sum_{k=1}^K \int_{P_k} q^k((j+1)\delta, dz, j\delta, x)\,Q(j\delta, dx).$$

The measure $Q(j\delta, dz)$ being assumed known, the computation of $Q((j+1)\delta, dz)$ reduces to the computation
of the quantities

$$A^k((j+1)\delta, dz) = \int_{P_k} q^k((j+1)\delta, dz, j\delta, x)\,Q(j\delta, dx).$$

Now $A^k((j+1)\delta, dz)$ can be interpreted as the unnormalized filter at time $(j+1)\delta$ for
the filtering problem $\mathcal{P}^k$ deduced from $\mathcal{P}$ by replacing $b$, $\sigma$, $h$ by $b_k$, $\sigma_k$, $h_k$, with the
initial law $1_{P_k}(z)\,Q(j\delta, dz)$ at time $j\delta$, that is, as a linear filter with non-Gaussian
initial condition, which we know how to implement. Indeed, Makowski obtained in [6]
the following result: the conditional law at a given time t for a linear filtering problem
with a signal of non-Gaussian initial law $Q_0(dz)$ can be computed by integrating
with respect to $Q_0(dz)$ a kernel which depends on the outputs at time t of an auxiliary finite-dimensional
recursive system, built from the coefficients of the filtering problem and
in which the initial law $Q_0(dz)$ does not appear. We call $SA^k$ the auxiliary system
thus associated with the linear filtering problem $\mathcal{P}^k$, for $k = 1, \ldots, K$.

Let us detail the procedure in the case where the signal $\{X_t\}_t$ is a real Brownian motion
with a centered initial law $F(dz)$ of density $p_0$, and the observed process is of the form

$$Y_t = \int_0^t |X_s|\,ds + B_t$$

with $\{B_t\}_t$ a real Brownian motion independent of the signal. We introduce the
filtering problem $\mathcal{P}^+$ (resp. $\mathcal{P}^-$) with a real Brownian signal $\{X_t\}_t$ and an observed process of the
form

$$\int_0^t X_s\,ds + B_t \qquad \Big(\text{resp. } -\int_0^t X_s\,ds + B_t\Big).$$

We denote by $A^+(\delta, dz)$ (resp. $A^-(\delta, dz)$) the unnormalized filter at time $\delta$ for problem
$\mathcal{P}^+$ (resp. $\mathcal{P}^-$) with the initial law $1_{\mathbb{R}^+}(x)\,p_0(x)\,dx$ (resp. $1_{\mathbb{R}^-}(x)\,p_0(x)\,dx$). We
approximate $p_\delta(dz)$ by

$$Q(\delta, dz) = A^+(\delta, dz) + A^-(\delta, dz)$$

then we repeat the above procedure between time $\delta$ and time $2\delta$ with $F(dz) = Q(\delta, dz)$
as new initial law, and so on up to the final time T. The measure $Q(j\delta, dz)$
being the approximation of $p_{j\delta}(dz)$ thus computed, $p_{(j+1)\delta}(dz)$ is approximated by

$$Q((j+1)\delta, dz) = A^+((j+1)\delta, dz) + A^-((j+1)\delta, dz)$$

where $A^+((j+1)\delta, dz)$ (resp. $A^-((j+1)\delta, dz)$) is the unnormalized filter at time $(j+1)\delta$
for problem $\mathcal{P}^+$ (resp. $\mathcal{P}^-$) with the initial law at time $j\delta$

$$F^+(dz) = 1_{\mathbb{R}^+}(z)\,Q(j\delta, dz) \qquad (\text{resp. } F^-(dz) = 1_{\mathbb{R}^-}(z)\,Q(j\delta, dz))$$

and the results of Makowski [6] provide the value at every point of $\mathbb{R}$ of the densities of
these filters. More precisely, let $\{\zeta_t^+\}_t$, $\{\zeta_t^-\}_t$ be the solutions of the stochastic differential
equations

$$d\zeta_t^\pm = \mp (R(t) - 1)\,dY_t, \qquad \zeta_0^\pm = 0,$$

$P(t)$, $R(t)$ being the solutions of the ordinary differential equations of Riccati type

$$\frac{dP}{dt} = 1 - P(t)^2, \qquad P(0) = 0,$$

$$\frac{dR}{dt} = P(t)\,(1 - R(t)), \qquad R(0) = 0.$$

A function $\gamma(t)$ is then defined as an integral over time of an expression in $R(s) - 1$, and,
for every $x$ in $\mathbb{R}$, the density $g^\pm(x)$ of $A^\pm((j+1)\delta, dx)$ at the point $x$ is written as an
integral with respect to $Q(j\delta, dz)$ of an explicit exponential kernel built from these quantities:

[formula (4); the displayed expressions of $\gamma(t)$ and of the density are not legible in the source] (4)
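Assuming the Riccati pair of this example reads dP/dt = 1 - P², dR/dt = P(1 - R) with P(0) = R(0) = 0 (a reconstruction of the source, which is hard to read at this point), both equations have closed forms, P(t) = tanh t and R(t) = 1 - 1/cosh t. The sketch below integrates the pair with a classical Runge-Kutta step and can be compared with these closed forms; the function name and step count are illustrative choices.

```python
import numpy as np

def riccati_pair(T=1.0, n=1000):
    """Integrate dP/dt = 1 - P^2, dR/dt = P (1 - R), P(0) = R(0) = 0 with RK4.

    The equations are the assumed form of the auxiliary system for a Brownian
    signal observed with unit gain and unit observation noise."""
    def rhs(state):
        p, r = state
        return np.array([1.0 - p**2, p * (1.0 - r)])

    dt = T / n
    state = np.zeros(2)
    out = [state.copy()]
    for _ in range(n):
        k1 = rhs(state)
        k2 = rhs(state + 0.5 * dt * k1)
        k3 = rhs(state + 0.5 * dt * k2)
        k4 = rhs(state + dt * k3)
        state = state + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        out.append(state.copy())
    return np.linspace(0.0, T, n + 1), np.array(out)

t, sol = riccati_pair()
p_exact = np.tanh(t)               # closed form of P
r_exact = 1.0 - 1.0 / np.cosh(t)   # closed form of R
```

These quantities do not depend on the observation or on the initial law, so they can be tabulated once and reused at every discretization step.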


In practice, the $g^\pm(x)$ are computed on a grid symmetric with respect to the point 0, and the
integrals are computed by linearizing the integrands on the grid. Between two
discretization times, the algorithm is not difficult to implement: it suffices to run the
auxiliary systems $SA^k$, $k = 1, \ldots, K$, which are recursive and finite dimensional (this takes
little time and memory). On the other hand, at each time that is a multiple of $\delta$, one has to compute
the values of the new density at each point of the grid, hence a very large number
of integrals, which makes the algorithm very heavy, already in the above example, which is nevertheless the
simplest possible.

We treated this example, with a centered Gaussian initial law $F(dz)$ of variance $v_0$
and an observation noise of fixed variance, on a Multics computer. We obtained the following
computing times for the conditional density at time T = 1 corresponding to one
simulation of the signal and of the observed process:
(1) 2 minutes 46 sec. for a discretization step $\delta$ = 0.1, the integrals being computed
on a grid of mesh 0.1 between the points -6 and 6;

(2) 20 minutes 27 sec. for $\delta$ = 0.01 with the same grid;

(3) 155 minutes 38 sec. for $\delta$ = 0.005 with a grid twice as fine.
Comparing the filters obtained in these 3 cases, it appears that the results of case 1
are sometimes rather poor. One should therefore in fact count on a minimum of about twenty
minutes of computing time (for each observed trajectory) to obtain a reliable result,
and it is very clear that this computing time increases considerably with the fineness
of the discretization step and of the mesh of the integration grid. One thus sees that,
to use this algorithm in dimension greater than 1, one would first have
to propose approximation methods for computing integrals of type (4).
Still on this example, we observed a behaviour of the filter that could be
expected intuitively: over time, the conditional densities, which remain
symmetric with respect to 0, stay unimodal in some cases and have two peaks
once the signal has moved sufficiently far from 0. We also varied the variance
of the observation noise and found that the appearance of two peaks is more frequent when
this variance is small.
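The growth of the computing time can be compared with a crude operation count. The sketch below assumes that, at each of the T/δ discretization times, the new density is evaluated at every grid point and that each value costs one integral over the whole grid, so the work scales like (number of steps) × (number of grid points)². This cost model and the function name are assumptions; only the step sizes, grid and timings are taken from the text.

```python
def kernel_evaluations(delta, mesh, half_width=6.0, T=1.0):
    """Assumed cost model: one integral over the grid per grid point and per step."""
    points = round(2 * half_width / mesh) + 1     # grid on [-6, 6]
    steps = round(T / delta)                      # discretization times up to T
    return steps * points**2

# (label, delta, mesh, measured time in seconds) as reported above
runs = [
    ("case 1", 0.1, 0.1, 2 * 60 + 46),
    ("case 2", 0.01, 0.1, 20 * 60 + 27),
    ("case 3", 0.005, 0.05, 155 * 60 + 38),
]
counts = [kernel_evaluations(delta, mesh) for _, delta, mesh, _ in runs]
predicted_ratios = [counts[i + 1] / counts[i] for i in range(2)]
measured_ratios = [runs[i + 1][3] / runs[i][3] for i in range(2)]
```

Under this model the predicted cost ratios between successive cases are 10 and about 7.9, against measured ratios of about 7.4 and 7.6, which is consistent with the integrals at the discretization times dominating the computing time.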
Abstract

In [1], the EM algorithm has been investigated in the context of partially observed
continuous-time stochastic processes.

The purpose of this paper is to compare this approach with the direct maximization of
the likelihood ratio, in the particular case of diffusion processes. This leads to a comparison
of nonlinear smoothing and nonlinear filtering for the computation of a certain class of
conditional expectations, relevant to the problem of estimation (Section 3). In particular, this
explains why smoothing is indeed necessary for the EM algorithm approach to be efficient.


1  Introduction:  the  EM  algorithm 

The EM algorithm is an iterative method for maximizing a likelihood ratio, in a situation of
partial observation [2]. Indeed, let $(P_\theta\,;\ \theta \in \Theta)$ be a family of mutually absolutely continuous
probabilities on a space $(\Omega, \mathcal{F})$, and let $\mathcal{Y} \subset \mathcal{F}$ be the σ-algebra representing all the available
information. Then, the log-likelihood ratio can be defined as:

$$L(\theta) = \log E_\alpha\Big( \frac{dP_\theta}{dP_\alpha} \,\Big|\, \mathcal{Y} \Big)$$

where $\alpha$ is fixed in $\Theta$, and the MLE (maximum likelihood estimate) as:

$$\hat\theta \in \arg\max_{\theta \in \Theta} L(\theta)$$

The EM algorithm is based on the following direct application of Jensen's inequality:

$$L(\theta) - L(\theta') = \log E_{\theta'}\Big( \frac{dP_\theta}{dP_{\theta'}} \,\Big|\, \mathcal{Y} \Big) \ge E_{\theta'}\Big( \log \frac{dP_\theta}{dP_{\theta'}} \,\Big|\, \mathcal{Y} \Big) = Q(\theta, \theta') \qquad (1)$$

which gives, for each value $\theta'$ of the parameter, a lower bound on the log-likelihood function
$\theta \mapsto L(\theta)$ by means of an auxiliary function $\theta \mapsto L(\theta') + Q(\theta, \theta')$, with equality at $\theta = \theta'$.

The way the EM algorithm works is described by the flow chart given in Fig. 2, whereas
Fig. 1 shows a few sample steps of the algorithm.
Figure 2: Algorithm flow chart
An interesting feature of the algorithm is that it generates a maximizing sequence $\{\theta_p\,;\ p =
0, 1, \ldots\}$ in the sense that $L(\theta_{p+1}) \ge L(\theta_p)$. Some general convergence results about the
sequences $\{L(\theta_p)\,;\ p = 0, 1, \ldots\}$ or $\{\theta_p\,;\ p = 0, 1, \ldots\}$ are proved in [10], under mild regularity
assumptions on $L(\cdot)$ and $Q(\cdot, \cdot)$.
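The monotonicity property $L(\theta_{p+1}) \ge L(\theta_p)$ can be observed on a toy problem unrelated to diffusions: a mixture of two unit-variance Gaussians with known equal weights and unknown means, where the component labels play the role of the unobserved information. The model, the data and all names below are assumptions chosen for illustration.

```python
import numpy as np

def em_two_means(x, mu, n_iter=20):
    """EM for the means of an equal-weight mixture of two unit-variance Gaussians.

    Returns the final means and the log-likelihood recorded before each update."""
    mu = np.asarray(mu, dtype=float)
    history = []
    for _ in range(n_iter):
        dens = 0.5 * np.exp(-0.5 * (x[:, None] - mu[None, :]) ** 2) / np.sqrt(2.0 * np.pi)
        history.append(np.sum(np.log(dens.sum(axis=1))))        # L(theta_p)
        resp = dens / dens.sum(axis=1, keepdims=True)           # E step: responsibilities
        mu = (resp * x[:, None]).sum(axis=0) / resp.sum(axis=0) # M step: weighted means
    return mu, np.array(history)

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(2.0, 1.0, 200)])
mu_hat, loglik = em_two_means(data, mu=[-0.5, 0.5])
```

Here the M step is available in closed form, which is the situation described by point (M) below; in the diffusion setting the corresponding step requires solving a smoothing problem.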

For  this  algorithm  to  be  interesting  from  a  computational  point  of  view,  the  following  two 
features  should  be  found: 

(E) computing the auxiliary function $Q(\cdot, \theta')$ should not be much more complicated than
computing the original log-likelihood ratio $L(\cdot)$,

(M) maximizing the auxiliary function $Q(\cdot, \theta')$ should be much simpler than maximizing the
original log-likelihood ratio $L(\cdot)$.

The latter will occur if $Q(\theta, \theta')$, as could be expected from the definition (1), can be
explicitly computed by means of a (generally infinite-dimensional) density depending only on
$\theta'$, acting on various simple functions depending on both $\theta$ and $\theta'$. If this is the case, computing
$Q(\theta, \theta')$ or the gradient $\nabla^{1,0} Q(\theta, \theta')$ with respect to $\theta$, for different values of the parameter $\theta$ ($\theta'$
being fixed), will not involve the computation of any other infinite-dimensional object.

To prove the existence of smooth enough (in the a.s. sense) versions of $\theta \mapsto L(\theta)$ and
$(\theta, \theta') \mapsto Q(\theta, \theta')$, as well as to get the expression of the corresponding derivatives, one can rely
on the following extension of Kolmogorov's lemma, and the next remark:

Proposition 1.1 [9, Lemma 1]

Let $(\Omega, \mathcal{F}, P)$ be a probability space and $(A(\theta)\,;\ \theta \in \Theta)$, with $\Theta \subset \mathbb{R}^p$, such that:

$\theta \mapsto A(\theta)$ is of class $C^{k,\alpha}$ (i.e. k-times continuously differentiable with its k-th
derivative Hölder-continuous of order $0 < \alpha < 1$) from $\Theta$ to $L^r(\Omega, \mathcal{F}, P)$.

Then there exists a random function $(\theta, \omega) \mapsto A(\theta, \omega)$ such that:

• $\forall \omega \in \Omega$ ; $\theta \mapsto A(\theta, \omega)$ is of class $C^j$ provided $j + \frac{p}{r} < k + \alpha$

• $\forall \theta \in \Theta$ ; $A(\theta, \cdot)$ is $\mathcal{F}$-measurable, and the a.s. derivatives of $A(\theta, \cdot)$ (up to order j) are a.s.
equal to the corresponding $L^r$-derivatives of $A(\theta)$

Remark: Let $\mathcal{Y} \subset \mathcal{F}$ be a sub σ-algebra. To prove the existence of an a.s. smooth version of
$\theta \mapsto B(\theta)$ with $B(\theta) = E(A(\theta) \mid \mathcal{Y})$, it is enough to check that $\theta \mapsto A(\theta)$ satisfies the assumptions
of the previous Proposition. Moreover the a.s. derivatives of (the smooth version of) $B(\theta)$ will
be a.s. equal to the conditional expectations with respect to $\mathcal{Y}$ of the corresponding derivatives
of (the smooth version of) $A(\theta)$.


The EM algorithm has been applied in the context of continuous-time stochastic processes
in [1] where, in the case of diffusion processes [1, Section 3], the general expression of $Q(\theta, \theta')$
has been derived [1, (3.1)] and said to involve a nonlinear smoothing problem. The authors have
also considered some particular cases in order to get more tractable results, as well as other
situations including finite-state Markov processes and linear systems [1, Sections 4-5].

The purpose of this paper is to get back to the general problem for diffusion processes and address the following three points:

• clarify the expression [1, (3.4)] giving $Q(\theta,\theta')$ in terms of a nonlinear smoothing problem,

• get an equivalent expression for $Q(\theta,\theta')$ and its gradient $\nabla^{10}Q(\theta,\theta')$, in terms of a nonlinear filtering problem (it will turn out that smoothing is indeed necessary for the point [M] introduced above to be satisfied, although filtering is enough to compute $Q(\theta,\theta')$ for a given value of $(\theta,\theta')$),

• get similar expressions for the original log-likelihood ratio $L(\theta)$ and its gradient $\nabla L(\theta)$.

This will make it possible to compare, from a computational point of view, the two possible approaches for maximum likelihood estimation:

• direct maximization of the likelihood ratio,

• the EM algorithm.

Finally, it should be mentioned that the scope of this paper is limited to "exact" formulas, in terms of stochastic PDE's (or their discretized approximations).


2  Statistical  framework 

In this section, expressions for the log-likelihood ratio $L(\cdot)$ and the auxiliary function $Q(\cdot,\cdot)$ will be derived in the following context (see [1, Section 3]).

Hypotheses: 

Let $\theta \in \Theta \subset R^p$ denote the unknown parameter. Assume:

• $(p_0^\theta(\cdot)\,;\,\theta \in \Theta)$ are mutually absolutely continuous densities on $R^m$,

• $b_\theta(\cdot)$ is a measurable and bounded function from $R^m$ to $R^m$,

• $\sigma(\cdot)$ is a continuous and bounded function on $R^m$ such that $a(\cdot) = \sigma(\cdot)\sigma^*(\cdot)$ is a uniformly strictly elliptic $m \times m$ matrix, i.e. $a(\cdot) \geq \alpha I$, and $\sum_{i=1}^m \frac{\partial}{\partial x_i} a^{ij}(\cdot)$ is a measurable and bounded function on $R^m$, for $j = 1, \ldots, m$,

• $h_\theta(\cdot)$ is a measurable and bounded function from $R^m$ to $R^d$.

Additional  hypotheses  concerning  the  regularity  with  respect  to  the  parameter  9  will  be 
needed  later  on. 

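Suppose then that a family $(P_\theta\,;\,\theta \in \Theta)$ of probabilities is given on a space $(\Omega, \mathcal{F})$, together with a pair of stochastic processes $(X_t\,;\,0 \leq t \leq T)$ and $(Y_t\,;\,0 \leq t \leq T)$ taking values in $R^m$ and $R^d$ respectively, such that under $P_\theta$:

$$dX_t = b_\theta(X_t)\,dt + \sigma(X_t)\,dW_t^\theta, \qquad X_0 \sim p_0^\theta(\cdot)$$

$$dY_t = h_\theta(X_t)\,dt + d\bar{W}_t^\theta$$

where $(W_t^\theta\,;\,0 \leq t \leq T)$ and $(\bar{W}_t^\theta\,;\,0 \leq t \leq T)$ are independent Wiener processes, and the initial condition $X_0$ is a r.v. independent of both. Then $(P_\theta\,;\,\theta \in \Theta)$ are mutually absolutely continuous probabilities on $(\Omega, \mathcal{F})$ with:

$$\Lambda_{\theta\theta'} = \frac{dP_\theta}{dP_{\theta'}} = \frac{p_0^\theta(X_0)}{p_0^{\theta'}(X_0)}\, \exp\Big\{ \int_0^T \big[a^{-1}(X_s)(b_\theta(X_s) - b_{\theta'}(X_s))\big]^* \sigma(X_s)\,dW_s^{\theta'} - \frac{1}{2}\int_0^T (b_\theta(X_s) - b_{\theta'}(X_s))^* a^{-1}(X_s)(b_\theta(X_s) - b_{\theta'}(X_s))\,ds \Big\}$$

$$\qquad \times \exp\Big\{ \int_0^T (h_\theta(X_s) - h_{\theta'}(X_s))^*\,dY_s - \frac{1}{2}\int_0^T \big(h_\theta^*(X_s)h_\theta(X_s) - h_{\theta'}^*(X_s)h_{\theta'}(X_s)\big)\,ds \Big\} \qquad (2)$$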

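Consider also the probability $\mathring{P}_\theta$ defined by:

$$Z^\theta = \frac{dP_\theta}{d\mathring{P}_\theta} = \exp\Big\{ \int_0^T h_\theta^*(X_s)\,dY_s - \frac{1}{2}\int_0^T h_\theta^*(X_s)h_\theta(X_s)\,ds \Big\}$$

so that, under $\mathring{P}_\theta$:

$$dX_t = b_\theta(X_t)\,dt + \sigma(X_t)\,dW_t^\theta, \qquad X_0 \sim p_0^\theta(\cdot)$$

and $(Y_t\,;\,0 \leq t \leq T)$ is a Wiener process independent of $(W_t^\theta\,;\,0 \leq t \leq T)$, and the r.v. $X_0$ is again independent of both. $\Lambda_{\theta\theta'}$ can then be decomposed as:

$$\Lambda_{\theta\theta'} = U_{\theta\theta'}\,\frac{Z^\theta}{Z^{\theta'}} \qquad \text{with:} \quad U_{\theta\theta'} = \frac{d\mathring{P}_\theta}{d\mathring{P}_{\theta'}}$$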


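It is assumed that only $(Y_t\,;\,0 \leq t \leq T)$ is observed, and let $(\mathcal{Y}_t\,;\,0 \leq t \leq T)$ denote the associated filtration. Then the likelihood ratio for the estimation of the parameter $\theta$ can be expressed as:

$$\mathring{E}_\alpha\Big( \frac{dP_\theta}{d\mathring{P}_\alpha} \,\Big|\, \mathcal{Y}_T \Big) = \mathring{E}_\alpha\big( Z^\theta U_{\theta\alpha} \,|\, \mathcal{Y}_T \big)$$

where $\alpha$ is fixed in $\Theta$. By Bayes formula:

$$\mathring{E}_\alpha\big( Z^\theta U_{\theta\alpha} \,|\, \mathcal{Y}_T \big) = \mathring{E}_\theta\big( Z^\theta \,|\, \mathcal{Y}_T \big) \times \mathring{E}_\alpha\big( U_{\theta\alpha} \,|\, \mathcal{Y}_T \big) = \mathring{E}_\theta\big( Z^\theta \,|\, \mathcal{Y}_T \big)$$

since $U_{\theta\alpha}$ is independent of $\mathcal{Y}_T$ under $\mathring{P}_\alpha$.

This gives the following two expressions for the log-likelihood ratio $L(\cdot)$:

$$L(\theta) = \log \mathring{E}_\alpha\big( Z^\theta U_{\theta\alpha} \,|\, \mathcal{Y}_T \big) \qquad (3)$$

$$\phantom{L(\theta)} = \log \mathring{E}_\theta\big( Z^\theta \,|\, \mathcal{Y}_T \big) \qquad (4)$$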


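For the auxiliary function $Q(\cdot,\cdot)$ defined by (1), one has immediately:

$$Q(\theta,\theta') = E_{\theta'}\big( \log \Lambda_{\theta\theta'} \,|\, \mathcal{Y}_T \big) \qquad (5)$$

$$\phantom{Q(\theta,\theta')} = \frac{\mathring{E}_{\theta'}\big( Z^{\theta'} \log \Lambda_{\theta\theta'} \,|\, \mathcal{Y}_T \big)}{\mathring{E}_{\theta'}\big( Z^{\theta'} \,|\, \mathcal{Y}_T \big)} \qquad (6)$$

$$\phantom{Q(\theta,\theta')} = \frac{\mathring{E}_\alpha\big( Z^{\theta'} U_{\theta'\alpha} \log \Lambda_{\theta\theta'} \,|\, \mathcal{Y}_T \big)}{\mathring{E}_\alpha\big( Z^{\theta'} U_{\theta'\alpha} \,|\, \mathcal{Y}_T \big)} \qquad (7)$$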


Remark: Formulas (4) and (6) will be used to compute the log-likelihood ratio and the auxiliary function respectively by means of a nonlinear filtering problem, formula (5) directly allows one to compute the auxiliary function by means of a nonlinear smoothing problem, whereas formulas (3) and (7) should be used to prove the existence of smooth versions and get the expression of the corresponding derivatives.

Indeed, under additional regularity assumptions, it is easy to prove, using Proposition 1.1, that both $\theta \mapsto L(\theta)$ and $\theta \mapsto Q(\theta,\theta')$ have a.s. differentiable versions, with gradients given by:


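$$\nabla L(\theta) = E_\theta\big( \rho^\theta \,|\, \mathcal{Y}_T \big) = \frac{\mathring{E}_\theta\big( \rho^\theta Z^\theta \,|\, \mathcal{Y}_T \big)}{\mathring{E}_\theta\big( Z^\theta \,|\, \mathcal{Y}_T \big)} \qquad (8)$$

$$\nabla^{10}Q(\theta,\theta') = E_{\theta'}\big( \rho^\theta \,|\, \mathcal{Y}_T \big) = \frac{\mathring{E}_{\theta'}\big( \rho^\theta Z^{\theta'} \,|\, \mathcal{Y}_T \big)}{\mathring{E}_{\theta'}\big( Z^{\theta'} \,|\, \mathcal{Y}_T \big)} \qquad (9)$$

respectively, where ($\nabla$ denoting differentiation with respect to the parameter $\theta$):

$$\rho^\theta = \frac{\nabla p_0^\theta(X_0)}{p_0^\theta(X_0)} + \int_0^T \big(a^{-1}(X_s)\nabla b_\theta(X_s)\big)^* \sigma(X_s)\,dW_s^\theta + \int_0^T \big(\nabla h_\theta(X_s)\big)^* \big(dY_s - h_\theta(X_s)\,ds\big) \qquad (10)$$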


Remark: One can check from (8) and (9) that:

$$\nabla^{10}Q(\theta,\theta')\,\big|_{\theta=\theta'} = \nabla L(\theta')$$

as expected.
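The identity in the remark above can be checked numerically on a toy model. The sketch below is not the diffusion setting of this paper: it assumes a hypothetical two-valued hidden variable with a Gaussian observation (the names `M`, `PRIOR`, `log_joint`, `aux_q` are illustrative only), and compares finite-difference derivatives of the log-likelihood and of the auxiliary function at $\theta = \theta'$.

```python
import numpy as np

M = np.array([-1.0, 2.0])      # hypothetical state-dependent means
PRIOR = np.array([0.5, 0.5])   # hypothetical prior on the hidden variable

def log_joint(theta, y):
    """log p_theta(x, y) for x = 0, 1, assuming Y | X = x ~ N(theta * M[x], 1)."""
    return np.log(PRIOR) - 0.5 * (y - theta * M) ** 2 - 0.5 * np.log(2 * np.pi)

def log_lik(theta, y):
    """Log-likelihood of the observation alone (hidden variable summed out)."""
    return float(np.log(np.sum(np.exp(log_joint(theta, y)))))

def aux_q(theta, theta_ref, y):
    """Q(theta, theta_ref): conditional expectation of the complete log-likelihood ratio."""
    lj_ref = log_joint(theta_ref, y)
    posterior = np.exp(lj_ref - np.log(np.sum(np.exp(lj_ref))))
    return float(np.sum(posterior * (log_joint(theta, y) - lj_ref)))

def central_diff(f, x, eps=1e-5):
    return (f(x + eps) - f(x - eps)) / (2 * eps)

y_obs, theta_ref = 0.7, 0.4
grad_l = central_diff(lambda t: log_lik(t, y_obs), theta_ref)
grad_q = central_diff(lambda t: aux_q(t, theta_ref, y_obs), theta_ref)
```

In this toy setting the two derivatives agree up to discretization error, and $Q(\theta',\theta') = 0$, which mirrors the structure of (5) and (8)-(9).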

The next section presents different ways, mainly by means of SPDE, to compute the various quantities introduced so far: $L(\theta)$, $\nabla L(\theta)$, $Q(\theta,\theta')$ and $\nabla^{10}Q(\theta,\theta')$. This will make possible the numerical implementation of algorithms for the maximization of the likelihood ratio.


3  Smoothing  vs.  filtering  for  the  computation  of  a  certain  class 
of  conditional  expectations 


For the sake of simplicity, any reference to the parameter $\theta$ will be dropped throughout this section. In particular, $P$ will denote the probability under which:

$$dX_t = b(X_t)\,dt + \sigma(X_t)\,dW_t, \qquad X_0 \sim p_0(\cdot)$$

$$dY_t = h(X_t)\,dt + d\bar{W}_t$$

where $(W_t\,;\,0 \leq t \leq T)$ and $(\bar{W}_t\,;\,0 \leq t \leq T)$ are independent Wiener processes, whereas under $\mathring{P}$:

$$dX_t = b(X_t)\,dt + \sigma(X_t)\,dW_t, \qquad X_0 \sim p_0(\cdot)$$

and $(Y_t\,;\,0 \leq t \leq T)$ is a Wiener process independent of $(W_t\,;\,0 \leq t \leq T)$. Define also the process $(Z_t\,;\,0 \leq t \leq T)$ by:

$$Z_t = \exp\Big\{ \int_0^t h^*(X_s)\,dY_s - \frac{1}{2}\int_0^t h^*(X_s)h(X_s)\,ds \Big\}$$

The purpose of this section is to provide two different ways - one based on nonlinear smoothing, the other on nonlinear filtering - for the computation of the following class of conditional expectations:

$$A = E\Big( \beta(X_0) + \int_0^T \xi(X_s)\,ds + \int_0^T \eta^*(X_s)\,dY_s + \int_0^T \chi^*(X_s)\sigma(X_s)\,dW_s \,\Big|\, \mathcal{Y}_T \Big) \qquad (11)$$

where $\beta$, $\xi$, $\eta$ and $\chi$ are measurable and bounded functions from $R^m$ to $R$, $R$, $R^d$ and $R^m$ respectively. It is readily seen that the computation of either $\nabla L(\theta)$, $Q(\theta,\theta')$ or $\nabla^{10}Q(\theta,\theta')$ involves such conditional expectations.

It is clear from the definition that $A$ depends linearly on $(\beta,\xi,\eta,\chi)$. It will turn out that nonlinear smoothing is the only way to make this dependence explicit, although nonlinear filtering - which is simpler - is enough to just compute $A$. The following facts and notations about nonlinear filtering and smoothing equations are gathered here, and will be extensively used in the sequel:


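Notations:

• Filtering

$u_t$ (resp. $p_t$) will always denote the unnormalized (resp. normalized) conditional density of the r.v. $X_t$ given $\mathcal{Y}_t$, i.e.:

$$(p_t,\phi) = E\big( \phi(X_t) \,|\, \mathcal{Y}_t \big)$$

$$(u_t,\phi) = \mathring{E}\big( \phi(X_t) Z_t \,|\, \mathcal{Y}_t \big) \qquad (12)$$

where $\phi$ is a test-function. By Bayes formula:

$$(p_t,\phi) = \frac{(u_t,\phi)}{(u_t,1)} \qquad (13)$$

The equation for $(u_t\,;\,0 \leq t \leq T)$ is Zakai equation [4]:

$$du_t = \mathcal{L}^* u_t\,dt + h^* u_t\,dY_t, \qquad u_0 = p_0 \qquad (14)$$

where $\mathcal{L}^*$ denotes the adjoint operator of the generator $\mathcal{L}$ of the diffusion process $(X_t\,;\,0 \leq t \leq T)$, i.e.:

$$\mathcal{L} = \frac{1}{2}\sum_{i,j=1}^m a^{ij}\,\frac{\partial^2}{\partial x_i \partial x_j} + \sum_{i=1}^m b^i\,\frac{\partial}{\partial x_i}$$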


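• Smoothing (fixed-interval)

Let $T$ denote the fixed end time. $q_t$ (resp. $\pi_t$) will always denote the unnormalized (resp. normalized) conditional density of the r.v. $X_t$ given $\mathcal{Y}_T$, i.e.:

$$(\pi_t,\phi) = E\big( \phi(X_t) \,|\, \mathcal{Y}_T \big)$$

$$(q_t,\phi) = \mathring{E}\big( \phi(X_t) Z_T \,|\, \mathcal{Y}_T \big)$$

Again:

$$(\pi_t,\phi) = \frac{(q_t,\phi)}{(q_t,1)}$$

Introducing the backward Zakai equation:

$$dv_t + \mathcal{L} v_t\,dt + h^* v_t\,dY_t = 0, \qquad v_T = 1$$

one has [4]: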


[...]

Let [...] (resp. [...]) [...] stochastic semi-group [...] definition of stochastic semi-groups [...]

The next results are [...]

This gives a couple of [...]

Moreover, it follows [...]

From the computational point of view, [...] store the value of the [...] the backward Zakai equation [...] capacity of memory [...]

The most direct approach [...] nonlinear smoothing [...]

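so that, by Bayes formula:

$$A = E\big( \rho_T \,|\, \mathcal{Y}_T \big) = \frac{\mathring{E}\big( \rho_T Z_T \,|\, \mathcal{Y}_T \big)}{\mathring{E}\big( Z_T \,|\, \mathcal{Y}_T \big)}$$

The idea is to find an equation for $(w_t\,;\,0 \leq t \leq T)$ defined by:

$$(w_t,\phi) = \mathring{E}\big( \phi(X_t)\rho_t Z_t \,|\, \mathcal{Y}_t \big)$$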


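By Itô's lemma:

$$d\big[\phi(X_t)\rho_t Z_t\big] = \rho_t Z_t\,\mathcal{L}\phi(X_t)\,dt + \rho_t Z_t\,(D\phi(X_t))^*\sigma(X_t)\,dW_t$$

$$\qquad + \phi(X_t)Z_t\,\xi(X_t)\,dt + \phi(X_t)Z_t\,\eta^*(X_t)\,dY_t + \phi(X_t)Z_t\,\chi^*(X_t)\sigma(X_t)\,dW_t$$

$$\qquad + \phi(X_t)\rho_t\,h^*(X_t)Z_t\,dY_t + \phi(X_t)\,\eta^*(X_t)h(X_t)Z_t\,dt + Z_t\,(D\phi(X_t))^* a(X_t)\chi(X_t)\,dt$$

Using known properties of conditional expectation given the observation under the reference probability $\mathring{P}$, and the definition (12), one gets:

$$(w_t,\phi) = (p_0,\beta\phi) + \int_0^t (w_s,\mathcal{L}\phi)\,ds + \int_0^t (w_s,h^*\phi)\,dY_s$$

$$\qquad + \int_0^t (u_s,\xi\phi)\,ds + \int_0^t (u_s,\eta^*\phi)\,dY_s + \int_0^t (u_s,\eta^* h\,\phi)\,ds + \int_0^t (u_s,J(\chi)\phi)\,ds$$

where:

$$J(\chi)\phi = \chi^* a\,D\phi$$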

The equation satisfied (at least in a weak sense) by $(w_t\,;\,0 \leq t \leq T)$ is therefore:

$$dw_t = \mathcal{L}^* w_t\,dt + h^* w_t\,dY_t + (\xi + \eta^* h)\,u_t\,dt + \eta^* u_t\,dY_t + J^*(\chi)\,u_t\,dt, \qquad w_0 = \beta p_0 \qquad (27)$$

With the notations introduced above, one has:

$$A = \frac{(w_T,1)}{(u_T,1)} \qquad (28)$$


This expression is obviously simpler, and cheaper to compute, than the corresponding equation (26) obtained by smoothing. Unfortunately, the linear dependence of $(w_T,1)$ on $(\beta,\xi,\eta,\chi)$ is not made explicit, which should be the case for the point [M] to be satisfied. Therefore, the next step will be to make this dependence more explicit. Basically, one will recover the solution based on smoothing, so that there seems to be little gain overall. However, there will be some benefit:

• the stochastic integral in (23) will be given a rigorous meaning,

• the last term in (23) will also be given a computable expression, whether or not assumption (24) is satisfied.


which  is  exactly  the  second  term  in  (26). 

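• Study of $A^{(3)}(\chi)$

$$A^{(3)}(\chi) = \frac{(w_T^{(3)},1)}{(u_T,1)}$$

where:

$$dw_t^{(3)} = \mathcal{L}^* w_t^{(3)}\,dt + h^* w_t^{(3)}\,dY_t + J^*(\chi)\,u_t\,dt, \qquad w_0^{(3)} = 0$$

This gives successively, using again (18):

$$w_t^{(3)} = \int_0^t U_t^s\big[J^*(\chi)u_s\big]\,ds$$

$$(w_t^{(3)},\phi) = \int_0^t \big(U_t^s[J^*(\chi)u_s],\phi\big)\,ds = \int_0^t \big(J^*(\chi)u_s, V_s^t\phi\big)\,ds = \int_0^t \big(u_s, J(\chi)[V_s^t\phi]\big)\,ds$$

$$(w_T^{(3)},1) = \int_0^T (u_s, J(\chi)v_s)\,ds = \int_0^T (u_s, \chi^* a\,Dv_s)\,ds = \int_0^T \Big(q_s, \chi^* a\,\frac{Dv_s}{v_s}\Big)\,ds$$

From the identity:

$$u_s\,Dv_s = (q_s,1)\,u_s\,D\Big[\frac{\pi_s}{u_s}\Big]$$

one finally gets, using again (22):

$$A^{(3)}(\chi) = \frac{\int_0^T (u_s, \chi^* a\,Dv_s)\,ds}{(u_T,1)} = \int_0^T \Big(\chi^* a\,D\Big[\frac{\pi_s}{u_s}\Big], u_s\Big)\,ds \qquad (29)$$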

The link with the partial result of Lemma 3.1 is given by the following:

Lemma 3.2

Under assumption (24), expression (29) particularizes to:

$$A^{(3)}(\chi) = (\pi_T,F) - (\pi_0,F) - \int_0^T (\pi_s,\mathcal{L}F)\,ds$$

which is exactly (25).

Proof:

Under (24):

$$(u_s, \chi^* a\,Dv_s) = (u_s, (DF)^* a\,Dv_s)$$

$$\mathcal{L}(Fv_s) = F\,\mathcal{L}v_s + v_s\,\mathcal{L}F + (DF)^* a\,Dv_s$$

Therefore, using in particular (20):

$$(u_s, \chi^* a\,Dv_s) = (u_s, \mathcal{L}(Fv_s)) - (u_s, F\,\mathcal{L}v_s) - (u_s, v_s\,\mathcal{L}F)$$

$$\phantom{(u_s, \chi^* a\,Dv_s)} = (v_s\,\mathcal{L}^*u_s - u_s\,\mathcal{L}v_s, F) - (u_s v_s, \mathcal{L}F)$$

This gives successively, using (22) again:

$$\int_0^T (u_s, \chi^* a\,Dv_s)\,ds = (q_T,F) - (q_0,F) - \int_0^T (q_s,\mathcal{L}F)\,ds$$

$$A^{(3)}(\chi) = (\pi_T,F) - (\pi_0,F) - \int_0^T (\pi_s,\mathcal{L}F)\,ds$$

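• Study of $A^{(2)}(\eta)$

$$A^{(2)}(\eta) = \frac{(w_T^{(2)},1)}{(u_T,1)}$$

where:

$$dw_t^{(2)} = \mathcal{L}^* w_t^{(2)}\,dt + h^* w_t^{(2)}\,dY_t + \eta^* u_t\,dY_t + \eta^* h\,u_t\,dt, \qquad w_0^{(2)} = 0$$

The "variation of constant" argument which was used for the three previous terms does not hold here, at least in the continuous-time case. Consider instead the following partition of $[0,T]$: $0 = t_0 < t_1 < \cdots < t_N = T$, and the corresponding approximation to $(w_t^{(2)}\,;\,0 \leq t \leq T)$, with $\Delta Y_n = Y_{t_{n+1}} - Y_{t_n}$ and $\Delta t_n = t_{n+1} - t_n$:

$$w_{t_{n+1}}^{(2)} = U_{t_{n+1}}^{t_n} w_{t_n}^{(2)} + \eta^* u_{t_n}\,\Delta Y_n + \eta^* h\,u_{t_n}\,\Delta t_n$$

This gives successively, using again (18):

$$w_{t_N}^{(2)} = \sum_{i=0}^{N-1} U_{t_N}^{t_{i+1}}\big[\eta^* u_{t_i}\big]\,\Delta Y_i + \sum_{i=0}^{N-1} U_{t_N}^{t_{i+1}}\big[\eta^* h\,u_{t_i}\big]\,\Delta t_i$$

$$(w_{t_N}^{(2)},1) = \sum_{i=0}^{N-1} (\eta^* u_{t_i}, v_{t_{i+1}})\,\Delta Y_i + \sum_{i=0}^{N-1} (\eta^* h\,u_{t_i}, v_{t_{i+1}})\,\Delta t_i$$

Taking then the limit of both sides as the mesh of the partition goes to 0 gives:

$$(w_T^{(2)},1) = \int_0^T (\eta^* u_s, v_s)\,dY_s + \int_0^T (\eta^* h\,u_s, v_s)\,ds$$

where the stochastic integral is to be understood as a two-sided stochastic integral [6],[7]. Finally, using again (22):

$$A^{(2)}(\eta) = \frac{\int_0^T (q_s,\eta^*)\,dY_s}{(u_T,1)} + \int_0^T (\pi_s, \eta^* h)\,ds$$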

Remarks: 

• Whether or not the first term can be further simplified should be investigated, but this would definitely be out of the scope of this paper.

•  As  expected: 


("T.  1)  J  0 .  ) 

the  last  equality  resulting  from  the  definition  of  two-sided  stochastic  integrals 


3.4  Conclusion 

Two methods have been proposed for the computation of conditional expectations such as (11).

• Filtering gives:

$$A = \frac{(w_T,1)}{(u_T,1)}$$

where $(u_t\,;\,0 \leq t \leq T)$ and $(w_t\,;\,0 \leq t \leq T)$ are solutions to (14) and (27) respectively.

• Smoothing gives either:

$$(w_T,1) = (q_0,\beta) + \int_0^T (q_s, \xi + \eta^* h)\,ds + \int_0^T (q_s,\eta^*)\,dY_s + \int_0^T \Big(\chi^* a\,D\Big[\frac{q_s}{u_s}\Big], u_s\Big)\,ds$$

or:

$$A = (\pi_0,\beta) + \int_0^T (\pi_s, \xi + \eta^* h)\,ds + \frac{\int_0^T (q_s,\eta^*)\,dY_s}{(u_T,1)} + \int_0^T \Big(\chi^* a\,D\Big[\frac{\pi_s}{u_s}\Big], u_s\Big)\,ds$$

where $(q_t\,;\,0 \leq t \leq T)$, $(\pi_t\,;\,0 \leq t \leq T)$ and $(u_t\,;\,0 \leq t \leq T)$ are given by (13), (14), (15), (16), (19).

The advantage of smoothing over filtering is that the dependence on $(\beta,\xi,\eta,\chi)$ is made explicit: provided the underlying probability does not change, evaluating $A$ for a different set of data $(\beta,\xi,\eta,\chi)$ will not require the computation of a new infinite-dimensional object. In the filtering approach, one would have to solve another SPDE, with a different "right-hand side".

On the other hand, from the computational point of view, solving the equation for the smoothing density requires the storage of the filtering density, and is therefore more expensive.

The next two sections will be devoted to the application of these two approaches to the computation of quantities related to the direct likelihood maximization, and to the EM algorithm respectively.
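The trade-off summarized above can be illustrated on a discrete-time, finite-state analogue. The sketch below is only an illustration under stated assumptions, not the SPDE setting of this paper: it assumes a hypothetical hidden Markov chain with transition matrix `P`, positive likelihood weights `g[k]` playing the role of the increments of $Z_t$, and an additive functional $\sum_k \xi(X_k)$ in place of (11). The filtering route propagates a pair (`u`, `w`) and must be rerun for each new `xi`; the smoothing route stores `q = u * v` once and then evaluates any `xi` by a plain sum.

```python
import numpy as np

def forward(P, p0, g):
    """Unnormalized filter u_k (discrete analogue of the Zakai equation)."""
    u = [p0 * g[0]]
    for k in range(1, len(g)):
        u.append(g[k] * (P.T @ u[-1]))
    return np.array(u)

def backward(P, g):
    """Backward variable v_k with terminal condition v_N = 1."""
    v = [np.ones(P.shape[0])]
    for k in range(len(g) - 1, 0, -1):
        v.append(P @ (g[k] * v[-1]))
    return np.array(v[::-1])

def expectation_by_filtering(P, p0, g, xi):
    """E[sum_k xi(X_k) | observations] from a forward pass on the pair (u, w)."""
    u = p0 * g[0]
    w = xi * u
    for k in range(1, len(g)):
        u_next = g[k] * (P.T @ u)
        w = g[k] * (P.T @ w) + xi * u_next   # same dynamics as u, forced by u
        u = u_next
    return float(w.sum() / u.sum())

def expectation_by_smoothing(q, norm, xi):
    """Same quantity from the stored unnormalized smoothing weights q = u * v."""
    return float((q @ xi).sum() / norm)

rng = np.random.default_rng(1)
n_states, n_obs = 4, 20
P = rng.random((n_states, n_states))
P /= P.sum(axis=1, keepdims=True)
p0 = np.full(n_states, 1.0 / n_states)
g = rng.random((n_obs, n_states)) + 0.1      # hypothetical likelihood weights

u, v = forward(P, p0, g), backward(P, g)
q, norm = u * v, u[-1].sum()
xi_a, xi_b = rng.normal(size=n_states), rng.normal(size=n_states)
```

Once `q` has been stored, switching from `xi_a` to `xi_b` costs one matrix-vector product, whereas `expectation_by_filtering` needs a new forward pass; the price is the storage of the whole path `u`.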


4  Direct  maximization  of  the  likelihood  ratio 


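According to (4) and (12), the log-likelihood ratio $L(\theta)$ is given by any of the following expressions:

$$L(\theta) = \log\,(u_T^\theta,1) = \int_0^T (p_s^\theta, h_\theta^*)\,dY_s - \frac{1}{2}\int_0^T (p_s^\theta, h_\theta^*)(p_s^\theta, h_\theta)\,ds$$

where $p_t^\theta = u_t^\theta / (u_t^\theta,1)$ is the normalized conditional density, with (see (14)):

$$du_t^\theta = \mathcal{L}_\theta^* u_t^\theta\,dt + h_\theta^* u_t^\theta\,dY_t, \qquad u_0^\theta = p_0^\theta \qquad (30)$$

and:

$$\mathcal{L}_\theta = \frac{1}{2}\sum_{i,j=1}^m a^{ij}\,\frac{\partial^2}{\partial x_i \partial x_j} + \sum_{i=1}^m b_\theta^i\,\frac{\partial}{\partial x_i}$$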
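As an illustration of how $\log\,(u_T^\theta,1)$ might be evaluated in practice, the following sketch discretizes (30) on a one-dimensional grid. Everything specific in it is an assumption made for the example: the scalar model $b(x) = -x$, $\sigma = 1$, $h(x) = x$, the upwind finite-difference generator, and the splitting of each step into an implicit Euler prediction followed by a multiplicative correction. It is one possible discretization, not a scheme taken from this paper.

```python
import numpy as np

def generator_matrix(x, b, sigma):
    """Upwind finite-difference generator of dX = b dt + sigma dW on a uniform grid x."""
    n, dx = len(x), x[1] - x[0]
    gen = np.zeros((n, n))
    for i in range(n):
        diffusion = 0.5 * sigma(x[i]) ** 2 / dx ** 2
        if i + 1 < n:
            gen[i, i + 1] = diffusion + max(b(x[i]), 0.0) / dx
        if i > 0:
            gen[i, i - 1] = diffusion + max(-b(x[i]), 0.0) / dx
        gen[i, i] = -gen[i].sum()          # rows sum to zero: no mass leaves the grid
    return gen

def zakai_splitting(gen, h_vals, u0, dY, dt):
    """Path of unnormalized weights u_t (cell masses rather than densities)."""
    step = np.eye(len(u0)) - dt * gen.T    # implicit Euler for du = L* u dt
    u, path = u0.copy(), [u0.copy()]
    for dy in dY:
        u = np.linalg.solve(step, u)                           # prediction
        u = u * np.exp(h_vals * dy - 0.5 * h_vals ** 2 * dt)   # correction (h* u dY)
        path.append(u.copy())
    return np.array(path)

# Hypothetical scalar example: b(x) = -x, sigma = 1, h(x) = x.
rng = np.random.default_rng(0)
dt, n_steps = 0.01, 200
grid = np.linspace(-4.0, 4.0, 81)
gen = generator_matrix(grid, lambda x: -x, lambda x: 1.0)
u0 = np.exp(-0.5 * grid ** 2)
u0 /= u0.sum()

# Simulate the signal and the observation increments by Euler-Maruyama.
x, dY = 0.5, np.empty(n_steps)
for k in range(n_steps):
    dY[k] = x * dt + np.sqrt(dt) * rng.normal()
    x += -x * dt + np.sqrt(dt) * rng.normal()

u_path = zakai_splitting(gen, grid, u0, dY, dt)
log_likelihood = np.log(u_path[-1].sum())                 # analogue of log (u_T, 1)
posterior_mean = (u_path[-1] @ grid) / u_path[-1].sum()   # analogue of (p_T, x)
```

The implicit prediction step keeps the weights nonnegative and conserves total mass, so any change in `u.sum()` comes from the correction step alone.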


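According to (8) and (10), $\nabla L(\theta)$ belongs to the class of conditional expectations considered in Section 3, provided:

• the underlying probability is $P_\theta$,

• the following data are used:

$$\beta_\theta = \frac{\nabla p_0^\theta}{p_0^\theta}, \qquad \xi_\theta = -(\nabla h_\theta)^* h_\theta, \qquad \eta_\theta = \nabla h_\theta, \qquad \chi_\theta = a^{-1}\nabla b_\theta$$

In particular:

$$\xi_\theta + \eta_\theta^* h_\theta = 0$$

The approach based on filtering gives:

$$\nabla L(\theta) = \frac{(w_T^\theta,1)}{(u_T^\theta,1)}$$

with $(u_t^\theta\,;\,0 \leq t \leq T)$ and $(w_t^\theta\,;\,0 \leq t \leq T)$ given respectively by (30) and (see (27)):

$$dw_t^\theta = \mathcal{L}_\theta^* w_t^\theta\,dt + h_\theta^* w_t^\theta\,dY_t + (\nabla h_\theta)^* u_t^\theta\,dY_t + J_\theta^* u_t^\theta\,dt, \qquad w_0^\theta = \nabla p_0^\theta \qquad (31)$$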


where:

$$J_\theta\phi = J(\chi_\theta)\phi = (\nabla b_\theta)^* D\phi$$

Remarks:

• This equation is exactly what would be obtained by formally differentiating equation (30) with respect to the parameter $\theta$. This result was indeed obtained in [3], relying on the existence of a "robust" version of Zakai equation.

• If $\theta$ is a $p$-dimensional parameter, then the gradient $(w_t^\theta\,;\,0 \leq t \leq T)$ is a $p$-dimensional vector; each component of this vector actually solves a SPDE which is coupled only with $(u_t^\theta\,;\,0 \leq t \leq T)$ and with no other component; moreover the coupling occurs only through the "right-hand side" and each of these $(p+1)$ SPDE has the same dynamics. In other words, one has to solve the same SPDE with $(p+1)$ different "right-hand sides". As expected, smoothing will provide a more efficient way to deal with such a problem.

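Indeed:

$$\nabla L(\theta) = \frac{1}{(u_T^\theta,1)} \Big\{ \Big(q_0^\theta, \frac{\nabla p_0^\theta}{p_0^\theta}\Big) + \int_0^T \big(q_s^\theta, (\nabla h_\theta)^*\big)\,dY_s + \int_0^T \Big((\nabla b_\theta)^* D\Big[\frac{q_s^\theta}{u_s^\theta}\Big], u_s^\theta\Big)\,ds \Big\}$$

where $(u_t^\theta\,;\,0 \leq t \leq T)$ and $(q_t^\theta\,;\,0 \leq t \leq T)$ are given respectively by (30) and (see (21)):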

««+“»■£#  ?f  =  “r  (32) 
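The filtering route to $\nabla L(\theta)$, i.e. propagating the pair (30)-(31), can be sketched on a discrete-time, finite-state analogue. The model below is hypothetical and chosen only for illustration: a three-state chain with transition matrix `trans`, observation function $h_\theta(x) = \theta x$, and a scalar $\theta$ entering neither the dynamics nor the initial law. The derivative `w` obeys the same recursion as `u` with an extra forcing term, and the result is compared with a finite-difference derivative of the log-likelihood.

```python
import numpy as np

def loglik_and_gradient(theta, states, trans, p0, dY, dt):
    """Propagate u and w = du/dtheta together; return (log-likelihood, gradient)."""
    u, w = p0.copy(), np.zeros_like(p0)     # p0 does not depend on theta here
    for dy in dY:
        gain = np.exp(theta * states * dy - 0.5 * (theta * states) ** 2 * dt)
        d_gain = gain * (states * dy - theta * states ** 2 * dt)   # d(gain)/d(theta)
        pred_u, pred_w = trans.T @ u, trans.T @ w                  # same dynamics
        u, w = gain * pred_u, gain * pred_w + d_gain * pred_u      # w is forced by u
    return float(np.log(u.sum())), float(w.sum() / u.sum())

rng = np.random.default_rng(2)
dt = 0.01
states = np.array([-1.0, 0.0, 1.0])
trans = np.array([[0.98, 0.02, 0.00],
                  [0.01, 0.98, 0.01],
                  [0.00, 0.02, 0.98]])
p0 = np.full(3, 1.0 / 3.0)
dY = rng.normal(0.3 * dt, np.sqrt(dt), size=200)   # arbitrary observation increments

theta, eps = 0.8, 1e-5
_, grad = loglik_and_gradient(theta, states, trans, p0, dY, dt)
finite_diff = (loglik_and_gradient(theta + eps, states, trans, p0, dY, dt)[0]
               - loglik_and_gradient(theta - eps, states, trans, p0, dY, dt)[0]) / (2 * eps)
```

For a $p$-dimensional parameter, `w` would become a stack of $p$ vectors sharing the same propagation step, which is the "same SPDE with $(p+1)$ right-hand sides" structure noted in the remarks.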


5  The  EM  algorithm 

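According to (5) and (2), the auxiliary function $Q(\theta,\theta')$ belongs to the class of conditional expectations considered in Section 3, provided:

• the underlying probability is $P_{\theta'}$,

• the following data are used:

$$\beta_{\theta\theta'} = \log\frac{p_0^\theta}{p_0^{\theta'}}$$

$$\xi_{\theta\theta'} = -\frac{1}{2}\big[ (b_\theta - b_{\theta'})^* a^{-1}(b_\theta - b_{\theta'}) + (h_\theta^* h_\theta - h_{\theta'}^* h_{\theta'}) \big]$$

$$\eta_{\theta\theta'} = h_\theta - h_{\theta'}$$

$$\chi_{\theta\theta'} = a^{-1}(b_\theta - b_{\theta'})$$

In particular:

$$\xi_{\theta\theta'} + \eta_{\theta\theta'}^* h_{\theta'} = -\frac{1}{2}\big[ (b_\theta - b_{\theta'})^* a^{-1}(b_\theta - b_{\theta'}) + (h_\theta - h_{\theta'})^*(h_\theta - h_{\theta'}) \big]$$

The approach based on filtering gives:

$$Q(\theta,\theta') = \frac{(w_T^{\theta\theta'},1)}{(u_T^{\theta'},1)}$$

with $(u_t^{\theta'}\,;\,0 \leq t \leq T)$ and $(w_t^{\theta\theta'}\,;\,0 \leq t \leq T)$ given respectively by (30) and (see (27)):

$$dw_t^{\theta\theta'} = \mathcal{L}_{\theta'}^* w_t^{\theta\theta'}\,dt + h_{\theta'}^* w_t^{\theta\theta'}\,dY_t + (h_\theta - h_{\theta'})^* u_t^{\theta'}\,dY_t + J_{\theta\theta'}^* u_t^{\theta'}\,dt$$

$$\qquad - \frac{1}{2}\big[ (b_\theta - b_{\theta'})^* a^{-1}(b_\theta - b_{\theta'}) + (h_\theta - h_{\theta'})^*(h_\theta - h_{\theta'}) \big]\,u_t^{\theta'}\,dt$$

$$w_0^{\theta\theta'} = p_0^{\theta'} \log\frac{p_0^\theta}{p_0^{\theta'}}$$

where:

$$J_{\theta\theta'}\phi = J(\chi_{\theta\theta'})\phi = (b_\theta - b_{\theta'})^* D\phi$$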

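On the other hand, smoothing gives:

$$Q(\theta,\theta') = \Big(\pi_0^{\theta'}, \log\frac{p_0^\theta}{p_0^{\theta'}}\Big) + \frac{\int_0^T \big(q_s^{\theta'}, (h_\theta - h_{\theta'})^*\big)\,dY_s}{(u_T^{\theta'},1)} + \int_0^T \Big((b_\theta - b_{\theta'})^* D\Big[\frac{\pi_s^{\theta'}}{p_s^{\theta'}}\Big], p_s^{\theta'}\Big)\,ds$$

$$\qquad - \frac{1}{2}\int_0^T \Big(\pi_s^{\theta'}, \big[ (b_\theta - b_{\theta'})^* a^{-1}(b_\theta - b_{\theta'}) + (h_\theta - h_{\theta'})^*(h_\theta - h_{\theta'}) \big]\Big)\,ds \qquad (33)$$

where:

$$p_t^{\theta'} = \frac{u_t^{\theta'}}{(u_t^{\theta'},1)}, \qquad \pi_t^{\theta'} = \frac{q_t^{\theta'}}{(q_t^{\theta'},1)}$$

$(u_t^{\theta'}\,;\,0 \leq t \leq T)$ and $(q_t^{\theta'}\,;\,0 \leq t \leq T)$ are given respectively by (30) and (32).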


Remark: It is readily seen from the last expression that the point [M] defined in the Introduction is satisfied:

• the regularity of $Q(\cdot,\theta')$ relies in an obvious way on the existence of derivatives with respect to $\theta$ of $\log p_0^\theta$, $b_\theta$ and $h_\theta$,

• computing the corresponding derivatives, and maximizing $Q(\cdot,\theta')$, will not involve the computation of any other infinite-dimensional object such as a conditional density.

Moreover, as was pointed out in [1], there are particular cases in which the M-step can be dealt with explicitly. This includes the case where:

• $\log p_0^\theta$ depends quadratically on $\theta$,

• $b_\theta$ and $h_\theta$ depend linearly on $\theta$,

since $\theta \mapsto Q(\theta,\theta')$ then becomes a quadratic form.
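In the quadratic case the M-step reduces to a linear solve. The sketch below assumes that $Q(\theta,\theta')$ has already been reduced to the form $s^*(\theta - \theta') - \frac{1}{2}(\theta - \theta')^* C\,(\theta - \theta')$ with $C$ symmetric positive definite; the arrays `slope` and `curvature` are hypothetical stand-ins for the coefficients that would be obtained by integrating against the conditional densities appearing in (33), and are not computed here.

```python
import numpy as np

def m_step_quadratic(theta_ref, curvature, slope):
    """Maximizer of slope.(theta - theta_ref) - 0.5 (theta - theta_ref)' C (theta - theta_ref)."""
    return theta_ref + np.linalg.solve(curvature, slope)

def q_quadratic(theta, theta_ref, curvature, slope):
    """Value of the quadratic auxiliary function at theta."""
    delta = theta - theta_ref
    return float(slope @ delta - 0.5 * delta @ curvature @ delta)

theta_ref = np.array([0.5, -1.0])                  # current EM iterate (hypothetical)
curvature = np.array([[2.0, 0.3], [0.3, 1.0]])     # hypothetical, symmetric positive definite
slope = np.array([0.4, -0.2])                      # hypothetical
theta_new = m_step_quadratic(theta_ref, curvature, slope)
```

Since the quadratic form vanishes at `theta_ref`, the value at `theta_new` is nonnegative, which is the usual monotonicity property of an EM iteration.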


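According to (9) and (10), $\nabla^{10}Q(\theta,\theta')$ belongs to the class of conditional expectations considered in Section 3, provided:

• the underlying probability is $P_{\theta'}$,

• the following data are used:

$$\beta_\theta = \frac{\nabla p_0^\theta}{p_0^\theta}$$

$$\xi_{\theta\theta'} = -(\nabla b_\theta)^* a^{-1}(b_\theta - b_{\theta'}) - (\nabla h_\theta)^* h_\theta$$

$$\eta_\theta = \nabla h_\theta$$

$$\chi_\theta = a^{-1}\nabla b_\theta$$

In particular:

$$\xi_{\theta\theta'} + \eta_\theta^* h_{\theta'} = -(\nabla b_\theta)^* a^{-1}(b_\theta - b_{\theta'}) - (\nabla h_\theta)^*(h_\theta - h_{\theta'})$$

The approach based on filtering gives:

$$\nabla^{10}Q(\theta,\theta') = \frac{(w_T^{\theta\theta'},1)}{(u_T^{\theta'},1)}$$

with $(u_t^{\theta'}\,;\,0 \leq t \leq T)$ and $(w_t^{\theta\theta'}\,;\,0 \leq t \leq T)$ given respectively by (30) and (see (27)):

$$dw_t^{\theta\theta'} = \mathcal{L}_{\theta'}^* w_t^{\theta\theta'}\,dt + h_{\theta'}^* w_t^{\theta\theta'}\,dY_t + (\nabla h_\theta)^* u_t^{\theta'}\,dY_t + J_\theta^* u_t^{\theta'}\,dt$$

$$\qquad - \big[ (\nabla b_\theta)^* a^{-1}(b_\theta - b_{\theta'}) + (\nabla h_\theta)^*(h_\theta - h_{\theta'}) \big]\,u_t^{\theta'}\,dt$$

$$w_0^{\theta\theta'} = \frac{p_0^{\theta'}}{p_0^\theta}\,\nabla p_0^\theta$$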


Remark: Comparing with (31) one can check that:

$$\nabla^{10}Q(\theta,\theta')\,\big|_{\theta=\theta'} = \nabla L(\theta')$$

as expected.

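As for the smoothing approach, one can use again the results of Section 3. Alternatively, one can directly differentiate with respect to $\theta$ the expression (33) for $Q(\theta,\theta')$, thus illustrating the point [M]. Indeed:

$$\nabla^{10}Q(\theta,\theta') = \Big(\pi_0^{\theta'}, \frac{\nabla p_0^\theta}{p_0^\theta}\Big) + \frac{\int_0^T \big(q_s^{\theta'}, (\nabla h_\theta)^*\big)\,dY_s}{(u_T^{\theta'},1)} + \int_0^T \Big((\nabla b_\theta)^* D\Big[\frac{\pi_s^{\theta'}}{p_s^{\theta'}}\Big], p_s^{\theta'}\Big)\,ds$$

$$\qquad - \int_0^T \Big(\pi_s^{\theta'}, \big[ (\nabla b_\theta)^* a^{-1}(b_\theta - b_{\theta'}) + (\nabla h_\theta)^*(h_\theta - h_{\theta'}) \big]\Big)\,ds$$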


6  Conclusion 


Two different approaches have been investigated for the MLE of partially observed diffusions. Some formulas given in [1] have been clarified, and it has been shown that smoothing is necessary to make the EM algorithm approach efficient. On the other hand, formulas have been given in terms of SPDE for the computation of the original log-likelihood ratio and its gradient. (As might have been noticed, expressions related to the direct approach are given in terms of unnormalized conditional densities, whereas in the EM algorithm approach normalized conditional densities have been used.)

As a consequence, it does not appear so clearly, except for some particular cases already considered in [1], that the EM algorithm is faster than the direct approach. This should be investigated on numerical examples.